27 research outputs found

    Towards an Unsupervised Method for Network Anomaly Detection in Large Datasets

    In this paper, we present a tree-based subspace clustering technique (TreeCLUSS) for finding clusters in network intrusion data and for detecting known as well as unknown attacks without using labelled traffic, signatures, or training. To establish its effectiveness in finding the appropriate number of clusters, we perform a cluster stability analysis. We also introduce a cluster labelling technique (CLUSSLab) that labels each cluster based on the stable cluster set obtained from TreeCLUSS. CLUSSLab is a multi-objective technique that employs an ensemble approach to label each stable cluster generated by TreeCLUSS and so achieve a high detection rate. We also introduce an unsupervised feature clustering technique to identify the dominating feature set of each cluster. We evaluate both TreeCLUSS and CLUSSLab on several real-world intrusion datasets, covering known as well as unknown attacks, and report excellent results.

    An incremental clustering of gene expression data

    Abstract: This paper presents an incremental clustering algorithm based on DGC, a density-based algorithm we developed earlier [1]. We experimented with real-life datasets, and both methods perform satisfactorily. The methods have been compared with some well-known clustering algorithms and perform well in terms of the z-score cluster validity measure.

    Simulating Human Tasks Using Simple Natural Language Instructions

    We report a simple natural language interface to a human task simulation system that graphically displays the performance of goal-directed tasks by an agent in a workspace. The inputs to the system are simple natural language commands requiring achievement of spatial relationships among objects in the workspace. To animate the behaviors denoted by instructions, a semantics of action verbs and locative expressions is devised in terms of physically based components, in particular geometric or spatial relations among the relevant objects. To generate human body motions that achieve such geometric goals, motion strategies and a planner that uses them are devised. The basic idea of the motion strategies is to use commonsense geometric relationships to determine appropriate body motions. Motion strategies for a given goal specify possibly overlapping subgoals of the relevant body parts in such a way that achieving the subgoals achieves the goal without collision with objects in the workspace. A motion plan generated using the motion strategies is basically a chart of temporally overlapping goal conditions of the relevant body parts. This motion plan is animated by sending it to a human motion controller, which incrementally finds joint angles of the agent's body that satisfy the goal conditions in the motion plan and displays the body's configurations determined by the joint angles.

    A rough set-based effective rule generation method for classification with an application in intrusion detection

    Abstract: In this paper, we use Rough Set Theory (RST) to address the important problem of generating decision rules for data mining. In particular, we propose a rough set-based approach to mine rules from inconsistent data. It computes the lower and upper approximations for each concept, and then builds concise classification rules for each concept that satisfy the required classification accuracy. Estimating lower and upper approximations substantially reduces the computational complexity of the algorithm. We use UCI ML Repository data sets to test and validate the approach. We also apply our approach to network intrusion data sets captured as network flows on our local network. The results show that our approach produces effective and minimal rules and provides satisfactory accuracy. Keywords: rough set; LEM2; inconsistency; minimal; redundant; PCS; intrusion detection; network flow data. Reference to this paper should be made as follows: Gogoi, P., Bhattacharyya, D.K. and Kalita, J.K. (2013) 'A rough set-based effective rule generation method for classification with an application in intrusion detection', Int
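
    The lower and upper approximations mentioned in this abstract can be illustrated with a minimal sketch of the textbook rough set definitions. This is not the paper's estimation procedure; the function name, attribute names, and toy records below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def approximations(rows, condition_attrs, decision_attr, concept):
    """Return (lower, upper) approximations of `concept` as sets of row indices."""
    # Rows that agree on every condition attribute are indiscernible,
    # so they fall into the same equivalence class.
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in condition_attrs)].add(i)

    # Rows whose decision value equals the concept of interest.
    target = {i for i, row in enumerate(rows) if row[decision_attr] == concept}

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the concept
            lower |= members
        if members & target:    # class overlaps the concept
            upper |= members
    return lower, upper

# Hypothetical toy records; rows 0 and 1 are inconsistent
# (same condition values, different labels).
rows = [
    {"proto": "tcp", "rate": "high", "label": "attack"},
    {"proto": "tcp", "rate": "high", "label": "normal"},
    {"proto": "udp", "rate": "low", "label": "normal"},
    {"proto": "tcp", "rate": "low", "label": "attack"},
]
lower, upper = approximations(rows, ["proto", "rate"], "label", "attack")
```

    Rows in the lower approximation certainly belong to the concept, while rows in the upper approximation only possibly belong to it; the gap between the two sets is where the inconsistent records sit.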

    Cutting Plane Training for Linear Support Vector Machines


    PARSING AND INTERPRETATION IN THE MINIMALIST PARADIGM

    In this paper, we discuss how recent theoretical linguistic research focusing on the Minimalist Program (MP) (Cho95, Mar95, Zwa94) can be used to guide the parsing of a useful range of natural language sentences and the building of a logical representation in a principles-based manner. We discuss the components of the MP and give an example derivation. We then propose parsing algorithms that recreate the derivation structure starting with a lexicon and the surface form of a sentence. Given the approximated derivation structure, MP principles are applied to generate a logical form, which leads to linguistically based algorithms for determining possible meanings of sentences that are ambiguous due to quantifier scope. Key words: Natural language understanding, Minimalist Program, Principles-based parsing.

    Cobi: pattern based co-regulated biclustering of gene expression data.

    Abstract: Co-regulation is a common phenomenon in gene expression

    A Preview on Subspace Clustering of High Dimensional Data (Council for Innovative Research)

    Abstract: When clustering high dimensional data, traditional clustering methods fall short because they consider all dimensions of the dataset when discovering clusters, whereas only some of the dimensions are relevant. Clusters may therefore exist only in subspaces of the dataset. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the entire dataset. The problem of automatically identifying clusters that exist in multiple, possibly overlapping subspaces of high dimensional data, allowing better clustering of the data points, is known as Subspace Clustering. There are two major approaches to subspace clustering, distinguished by search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches start by finding low dimensional dense regions and then use them to form clusters. Based on a survey of subspace clustering, we identify the challenges and issues involved in clustering gene expression data.
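
    The bottom-up strategy described in this abstract can be sketched as a grid-based search in the spirit of methods such as CLIQUE. This is a simplified illustration, not an algorithm from the paper: it stops at two-dimensional subspaces, assumes all values are scaled to [0, 1], and the function names, `bins`, and `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, dims, bins, min_count):
    """Grid cells in the subspace `dims` that hold at least `min_count` points."""
    counts = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    return {cell for cell, n in counts.items() if n >= min_count}

def bottom_up_subspaces(points, n_dims, bins=4, min_count=3):
    """Map each 1-D or 2-D subspace to its dense cells (values assumed in [0, 1])."""
    result = {}
    # Step 1: find dense intervals in each single dimension.
    for d in range(n_dims):
        cells = dense_units(points, (d,), bins, min_count)
        if cells:
            result[(d,)] = cells
    # Step 2: a 2-D cell can only be dense if both of its 1-D projections
    # are dense, so only pairs of dimensions that survived step 1 are tested.
    dense_dims = [d for (d,) in result]
    for pair in combinations(dense_dims, 2):
        cells = dense_units(points, pair, bins, min_count)
        if cells:
            result[pair] = cells
    return result

# Hypothetical toy data: dimensions 0 and 1 are tightly grouped,
# dimension 2 is spread out and therefore irrelevant.
points = [
    (0.10, 0.10, 0.05),
    (0.12, 0.15, 0.35),
    (0.20, 0.05, 0.60),
    (0.15, 0.20, 0.90),
]
subspaces = bottom_up_subspaces(points, n_dims=3)
```

    In this toy run the spread-out third dimension never produces a dense interval, so it is pruned before any higher dimensional subspace is examined.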